Probabilistic Analysis of a Motif Discovery Algorithm for Multiple Sequences

Authors

  • Bin Fu
  • Ming-Yang Kao
  • Lusheng Wang

Abstract

We study a natural probabilistic model for motif discovery that has been used to experimentally test the quality of motif discovery programs. In this model, there are $k$ background sequences, and each character in a background sequence is a random character from an alphabet $\Sigma$. A motif $G = g_1 g_2 \cdots g_m$ is a string of $m$ characters. A probabilistically generated approximate copy of $G$ is implanted into each background sequence. For an approximate copy $b_1 b_2 \cdots b_m$ of $G$, every character $b_i$ is probabilistically generated such that the probability that $b_i \neq g_i$ is at most $\alpha$. In this paper, we give the first analytical proof that multiple background sequences do help with finding subtle and faint motifs. This work takes a theoretical approach based on rigorous probabilistic analysis. We develop an algorithm that, under the probabilistic model, finds the implanted motif with high probability when the number of background sequences is reasonably large. Specifically, we prove that for $\alpha < 0.1771$ and any constant $x \geq 8$, there exist constants $t_0, \delta_0, \delta_1 > 0$ such that if the length of the motif is at least $\delta_0 \log n$, the alphabet has at least $t_0$ characters, and there are at least $\delta_1 \log n_0$ input sequences, then our algorithm finds the motif in $O(n^3)$ time with probability at least $1 - 1/2^x$, where $n$ is the maximum length of any input sequence and $n_0 \leq n$ is an upper bound on the length of the motif.
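The planted-motif model in the abstract can be simulated directly. Below is a minimal Python sketch under assumptions that are not taken from the paper: background characters are i.i.d. uniform over the alphabet, each copy character is mutated with probability exactly $\alpha$ (so the bound $\Pr[b_i \neq g_i] \leq \alpha$ holds) to a uniformly chosen different letter, and the copy overwrites a random window rather than being inserted. The names `plant_motif_instance` and `consensus_at_known_positions`, the 20-letter alphabet, and the parameter values are hypothetical. This is not the authors' $O(n^3)$ algorithm: the consensus step uses the true implant positions, which a real motif finder must discover, and only illustrates why more sequences make a faint motif recoverable.

```python
import random
from collections import Counter


def plant_motif_instance(k, n, motif, alphabet, alpha, rng):
    """Build k sequences of length n, each carrying one approximate copy of motif.

    Assumptions (illustrative, not from the paper): background letters are
    i.i.d. uniform over alphabet; each copy letter differs from the motif
    letter with probability exactly alpha; a mutated letter is drawn uniformly
    from the other letters; the copy overwrites a uniformly random window.
    """
    m = len(motif)
    sequences, positions = [], []
    for _ in range(k):
        seq = [rng.choice(alphabet) for _ in range(n)]
        copy = []
        for g in motif:
            if rng.random() < alpha:
                # Mutate: pick any letter other than the motif letter g.
                copy.append(rng.choice([c for c in alphabet if c != g]))
            else:
                copy.append(g)
        start = rng.randrange(n - m + 1)
        seq[start:start + m] = copy  # overwrite the window with the noisy copy
        sequences.append("".join(seq))
        positions.append(start)
    return sequences, positions


def consensus_at_known_positions(sequences, positions, m):
    """Column-wise majority vote over the implanted copies.

    Uses the true implant positions, which a real motif finder must
    discover; it only shows how k noisy copies pin down G.
    """
    letters = []
    for i in range(m):
        column = Counter(seq[p + i] for seq, p in zip(sequences, positions))
        letters.append(column.most_common(1)[0][0])
    return "".join(letters)


if __name__ == "__main__":
    rng = random.Random(0)
    alphabet = "ACDEFGHIKLMNPQRSTVWY"  # 20 letters, an illustrative choice
    motif = "".join(rng.choice(alphabet) for _ in range(12))
    seqs, pos = plant_motif_instance(50, 200, motif, alphabet, 0.15, rng)
    print(motif, consensus_at_known_positions(seqs, pos, len(motif)))
```

With $k = 50$ and $\alpha = 0.15$, each column holds about 42.5 correct letters on average, so the vote should reproduce the motif; reducing $k$ to a handful of sequences makes voting errors appear.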

Similar resources

Development of an Efficient Hybrid Method for Motif Discovery in DNA Sequences

This work presents a hybrid method for motif discovery in DNA sequences. The proposed method, called SPSO-Lk, borrows the concept of Chebyshev polynomials and uses stochastic local search to improve the performance of the basic PSO algorithm as a motif finder. The Chebyshev polynomial concept encourages us to use a linear combination of previously discovered velocities beyond that proposed b...

Genetic Algorithm Based Probabilistic Motif Discovery in Multiple Unaligned Biological Sequences

Many computational approaches have been introduced for the problem of motif identification in a set of biological sequences; they are classified according to the type of motifs discovered. In this study, we propose a model to discover motifs in large sets of unaligned sequences in considerably less time, using a genetic-algorithm-based probabilistic motif discovery model. The proposed algorith...

Efficient Algorithms for Model-Based Motif Discovery from Multiple Sequences

We study a natural probabilistic model for motif discovery that has been used to experimentally test the quality of motif discovery programs. In this model, there are $k$ background sequences, and each character in a background sequence is a random character from an alphabet $\Sigma$. A motif $G = g_1 g_2 \cdots g_m$ is a string of $m$ characters. Each background sequence is implanted with a randomly generated approx...

Clustering sequence sets for motif discovery

Most existing methods for DNA motif discovery consider only a single set of sequences to find an over-represented motif. In contrast, we consider multiple sets of sequences, grouping sets associated with the same motif into a cluster and assuming that each set involves a single motif. Clustering sets of sequences yields clusters of coherent motifs, improving signal-to-noise ratio or enabli...

Genetic Algorithm Based Probabilistic Motif Discovery in Unaligned Biological Sequences

Finding motifs in biosequences is the most important primitive operation in computational biology. A motif discovery algorithm must meet many computational requirements, such as memory space and computational complexity. To overcome the complexity of motif discovery, we propose an alternative solution integrating genetic algorithm and Fuzzy ART machine learning approaches...

Journal title:
  • SIAM J. Discrete Math.

Volume 23

Publication date: 2009